Character-Level Incremental Speech Recognition with Recurrent Neural Networks
In real-time speech recognition applications, latency is an important
issue. We have developed a character-level incremental speech recognition (ISR)
system that responds quickly even during speech, gradually refining its
hypotheses as the speaking proceeds. The algorithm employs a
speech-to-character unidirectional recurrent neural network (RNN), which is
trained end-to-end with connectionist temporal classification (CTC), and an
RNN-based character-level language model (LM). The output values of the
CTC-trained RNN are character-level probabilities, which are processed by beam
search decoding. The RNN LM augments the decoding by providing long-term
dependency information. We propose tree-based online beam search with
additional depth-pruning, which enables the system to process infinitely long
input speech with low latency. This system not only responds quickly to speech
but can also dictate out-of-vocabulary (OOV) words according to pronunciation.
The proposed model achieves a word error rate (WER) of 8.90% on the Wall
Street Journal (WSJ) Nov'92 20K evaluation set when trained on the WSJ SI-284
training set. Comment: To appear in ICASSP 201
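To make the decoding step concrete, here is a minimal sketch of standard CTC prefix beam search over per-frame character probabilities, assuming the CTC blank is symbol 0. It is not the authors' method: it omits the RNN LM and the tree-based online search with depth-pruning described above, and the function name `ctc_prefix_beam_search`, the toy alphabet and the probabilities are hypothetical.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, alphabet, beam_width=3, blank=0):
    """Decode per-frame CTC character probabilities into a string.

    probs[t][k] is the probability of symbol k at frame t; symbol `blank`
    is the CTC blank, and alphabet[k] is the character for symbol k.
    Hypothetical helper: plain prefix beam search, no language model.
    """
    # Each prefix (tuple of symbol ids) maps to two probabilities:
    # (paths ending in blank, paths ending in the prefix's last symbol).
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for k, p in enumerate(frame):
                if k == blank:
                    # A blank never changes the emitted prefix.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and prefix[-1] == k:
                    # Repeating the last symbol without a blank collapses into it...
                    next_beams[prefix][1] += p_nb * p
                    # ...but after a blank it emits a new, doubled character.
                    next_beams[prefix + (k,)][1] += p_b * p
                else:
                    next_beams[prefix + (k,)][1] += (p_b + p_nb) * p
        # Keep only the `beam_width` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda item: sum(item[1]), reverse=True)
        beams = {prefix: tuple(scores) for prefix, scores in ranked[:beam_width]}
    best = max(beams, key=lambda prefix: sum(beams[prefix]))
    return "".join(alphabet[k] for k in best)


# Toy example: index 0 is the blank, so "_" is never emitted.
alphabet = ["_", "a", "b"]
probs = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
print(ctc_prefix_beam_search(probs, alphabet))  # → ab
```

Keeping separate blank and non-blank scores per prefix is what lets the search merge alignments such as "a a _ b" and "a _ _ b" into the single hypothesis "ab"; an online system would additionally need to emit and prune hypotheses while frames are still arriving.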
Single stream parallelization of generalized LSTM-like RNNs on a GPU
Recurrent neural networks (RNNs) have shown outstanding performance on
processing sequence data. However, they suffer from long training time, which
demands parallel implementations of the training procedure. Parallelization of
the training algorithms for RNNs is very challenging because internal
recurrent paths form dependencies between two different time frames. In this
paper, we first propose a generalized graph-based RNN structure that covers the
most popular long short-term memory (LSTM) network. Then, we present a
parallelization approach that automatically exploits the parallelism of arbitrary
RNNs by analyzing the graph structure. Experimental results show that the
proposed approach achieves a substantial speed-up even with a single training
stream, and further accelerates the training when combined with multiple
parallel training streams. Comment: Accepted by the 40th IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP) 201
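The idea of finding parallelism by analyzing an RNN's graph can be illustrated with a generic level schedule (Kahn's topological sort, grouped by level). This is a sketch under stated assumptions, not the paper's algorithm: each edge carries a hypothetical `delay` field, where 0 means a dependency within the same time frame and 1 means a recurrent edge from the previous frame. Only same-frame edges constrain ordering inside one step, so operations in the same level can run concurrently. The LSTM-like operation names are also hypothetical.

```python
def parallel_levels(nodes, edges):
    """Group operations of one RNN time step into concurrently runnable levels.

    nodes: list of operation names.
    edges: list of (src, dst, delay); delay == 0 is a same-frame dependency,
    delay >= 1 is a recurrent edge from an earlier frame (assumed encoding).
    """
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst, delay in edges:
        if delay == 0:  # recurrent edges are already satisfied by the previous step
            indegree[dst] += 1
            successors[src].append(dst)
    level = [n for n in nodes if indegree[n] == 0]
    levels = []
    while level:
        levels.append(level)
        next_level = []
        for n in level:
            for m in successors[n]:
                indegree[m] -= 1
                if indegree[m] == 0:
                    next_level.append(m)
        level = next_level
    if sum(len(lv) for lv in levels) != len(nodes):
        raise ValueError("cycle without delay: graph is not a valid RNN")
    return levels


# Hypothetical LSTM-like step: gates read h_{t-1}, the cell reads c_{t-1}.
nodes = ["gate_i", "gate_f", "gate_o", "cand", "cell", "hidden"]
edges = [
    ("gate_i", "cell", 0), ("gate_f", "cell", 0), ("cand", "cell", 0),
    ("gate_o", "hidden", 0), ("cell", "hidden", 0),
    ("cell", "cell", 1),
    ("hidden", "gate_i", 1), ("hidden", "gate_f", 1),
    ("hidden", "gate_o", 1), ("hidden", "cand", 1),
]
print(parallel_levels(nodes, edges))
# → [['gate_i', 'gate_f', 'gate_o', 'cand'], ['cell'], ['hidden']]
```

Here the four gate computations share no same-frame dependency, so they form one level that could be launched together on a GPU even with a single training stream.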
Fixed-Point Performance Analysis of Recurrent Neural Networks
Recurrent neural networks have shown excellent performance in many
applications; however, they require increased complexity in hardware- or
software-based implementations. The hardware complexity can be greatly reduced
by minimizing the word length of weights and signals. This work analyzes the
fixed-point performance of recurrent neural networks using a retrain-based
quantization method. The quantization sensitivity of each layer in RNNs is
studied, and overall fixed-point optimization results that minimize the
capacity of weights without sacrificing performance are presented. A
language model and a phoneme recognition example are used
- …
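A retrain-based quantization loop can be sketched on a toy linear model. Following a common form of retraining, we assume that the forward pass and gradient use the quantized weights while the update goes to a full-precision copy. The uniform symmetric quantizer, the hand-picked step size, and the names `quantize` and `retrain_step` are assumptions for illustration. The paper's exact quantizer, step-size choice and layer-wise word-length search are not given in the abstract.

```python
def quantize(weights, bits, step):
    """Map each weight to the nearest signed fixed-point level.

    A `bits`-bit symmetric quantizer has levels -M..M with M = 2**(bits-1) - 1,
    spaced `step` apart; values outside the range are clipped.
    Note: Python's round() breaks exact ties toward the even level.
    """
    max_level = 2 ** (bits - 1) - 1
    return [max(-max_level, min(max_level, round(w / step))) * step for w in weights]


def retrain_step(w_float, bits, step, xs, ys, lr):
    """One gradient step on a linear model y = w . x with mean squared error.

    The forward pass (and hence the gradient) uses the quantized weights,
    but the update is applied to the full-precision copy `w_float`.
    """
    w_q = quantize(w_float, bits, step)
    grad = [0.0] * len(w_float)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w_q, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(xs)
    return [w - lr * g for w, g in zip(w_float, grad)]


# Toy data generated by y = 0.5*x0 - 0.25*x1, exactly representable with step 0.25.
xs = [(1, 0), (0, 1), (1, 1), (1, -1)]
ys = [0.5, -0.25, 0.25, 0.75]
w = [0.0, 0.0]
for _ in range(50):
    w = retrain_step(w, bits=4, step=0.25, xs=xs, ys=ys, lr=0.1)
print(quantize(w, bits=4, step=0.25))  # → [0.5, -0.25]
```

Keeping the full-precision copy matters because individual updates are often smaller than one quantization step. In this example the quantized weights stay at zero for several iterations while the float weights accumulate enough change to cross a level boundary.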